  • Topological Node2vec: Enhanced Graph Embedding via Persistent Homology

    Updated: 2024-05-18 19:32:04
    Yasuaki Hiraoka, Yusuke Imoto, Théo Lacombe, Killian Meehan, Toshiaki Yachimura; JMLR 25(134):1–26, 2024. Abstract: Node2vec is a graph embedding method that learns a vector representation for each node of a weighted graph while seeking to preserve relative proximity and global structure. Numerical experiments suggest Node2vec struggles to recreate the topology of the input graph. To resolve this, we introduce a topological loss term to be added to the training loss of Node2vec, which tries to align the persistence diagram (PD) of
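A toy illustration of the idea (not the paper's actual loss): for 0-dimensional homology, the death times of a Vietoris–Rips filtration are exactly the edge lengths of a minimum spanning tree on the pairwise-distance graph, so a crude "topological penalty" can compare sorted MST edge lengths of the embedding against those of a reference point cloud. The function and variable names here are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def h0_death_times(points):
    """0-dimensional persistence 'death times' of a Vietoris-Rips
    filtration equal the edge lengths of a minimum spanning tree
    of the pairwise-distance graph."""
    dist = squareform(pdist(points))
    mst = minimum_spanning_tree(dist)
    return np.sort(mst.data)

def topological_penalty(embedding, reference):
    """Illustrative penalty: squared L2 mismatch between sorted H0
    death times of the embedding and of a reference point cloud."""
    a, b = h0_death_times(embedding), h0_death_times(reference)
    return float(np.sum((a - b) ** 2))

rng = np.random.default_rng(0)
ref = rng.normal(size=(20, 2))        # stand-in for the input graph's geometry
emb = rng.normal(size=(20, 2))        # stand-in for a learned embedding
print(topological_penalty(emb, ref))  # non-negative scalar added to the loss
print(topological_penalty(ref, ref))  # 0.0 when the diagrams match exactly
```

The paper aligns full persistence diagrams; this sketch only captures the H0 connectivity scale, which is the simplest piece of that information.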

  • Granger Causal Inference in Multivariate Hawkes Processes by Minimum Message Length

    Updated: 2024-05-18 19:32:04
    Katerina Hlaváčková-Schindler, Anna Melnykova, Irene Tubikanec; JMLR 25(133):1–26, 2024. Abstract: Multivariate Hawkes processes (MHPs) are versatile probabilistic tools used to model various real-life phenomena: earthquakes, operations on stock markets, neuronal activity, virus propagation, and many others. In this paper, we focus on MHPs with exponential decay kernels and estimate connectivity graphs, which represent the Granger causal relations between their components. We approach this inference problem by

  • A General Framework for the Analysis of Kernel-based Tests

    Updated: 2024-05-18 19:32:04
    Tamara Fernández, Nicolás Rivera; JMLR 25(95):1–40, 2024. Abstract: Kernel-based tests provide a simple yet effective framework that uses the theory of reproducing kernel Hilbert spaces to design non-parametric testing procedures. In this paper, we propose new theoretical tools that can be used to study the asymptotic behaviour of kernel-based tests in various data scenarios and in different testing problems. Unlike current approaches, our methods avoid working with U- and V-statistic expansions that usually lead to lengthy and tedious

  • Causal-learn: Causal Discovery in Python

    Updated: 2024-05-18 19:32:04
    Causal discovery aims at revealing causal relations from observational data, which is a fundamental task in science and engineering. We describe causal-learn, an open-source Python library for causal discovery. This library focuses on bringing a comprehensive collection of causal discovery methods to both practitioners and researchers. It provides easy-to-use APIs for non-specialists, modular building blocks for developers, detailed documentation for learners, and comprehensive methods for all. Different from previous packages in R or Java, causal-learn is fully developed in Python, which could be more in tune with the recent preference shift in programming languages within related communities. The library is available at https://github.com/py-why/causal-learn.

  • Random Forest Weighted Local Fréchet Regression with Random Objects

    Updated: 2024-05-18 19:32:04
    Rui Qiu, Zhou Yu, Ruoqing Zhu; JMLR 25(107):1–69, 2024. Abstract: Statistical analysis is increasingly confronted with complex data from metric spaces. Petersen and Müller (2019) established a general paradigm of Fréchet regression with complex metric-space-valued responses and Euclidean predictors. However, the local approach therein involves nonparametric kernel smoothing and suffers from the curse of dimensionality. To address this issue, in this paper we propose a novel random forest weighted local Fréchet regression paradigm. The

  • Linear Distance Metric Learning with Noisy Labels

    Updated: 2024-05-18 19:32:04
    Meysam Alishahi, Anna Little, Jeff M. Phillips; JMLR 25(121):1–53, 2024. Abstract: In linear distance metric learning, we are given data in one Euclidean metric space and the goal is to find an appropriate linear map to another Euclidean metric space which respects certain distance conditions as much as possible. In this paper, we formalize a simple and elegant method which reduces to a general continuous convex loss optimization problem, and for different noise models we derive the corresponding loss functions. We show that even if the data is noisy

  • Tangential Wasserstein Projections

    Updated: 2024-05-18 19:32:04
    We develop a notion of projections between sets of probability measures using the geometric properties of the $2$-Wasserstein space. In contrast to existing methods, it is designed for multivariate probability measures that need not be regular, and is computationally efficient to implement via regression. The idea is to work on tangent cones of the Wasserstein space using generalized geodesics. Its structure and computational properties make the method applicable in a variety of settings where probability measures need not be regular, from causal inference to the analysis of object data. An application to estimating causal effects yields a generalization of the synthetic controls method for systems with general heterogeneity described via multivariate probability measures.
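As background for the geometry being used: on the real line, the 2-Wasserstein distance between two empirical measures with equally many atoms reduces to an L2 distance between sorted samples (the quantile view); the tangent-cone construction in the paper generalizes this picture to multivariate measures. A small numerical check, with illustrative sample sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.normal(0.0, 1.0, size=500))  # sample from N(0, 1)
y = np.sort(rng.normal(2.0, 1.0, size=500))  # sample from N(2, 1)

# Empirical 2-Wasserstein distance on R: L2 distance of sorted samples.
w2 = np.sqrt(np.mean((x - y) ** 2))
print(w2)  # close to 2.0, the distance between the two means
```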

  • Sparse Representer Theorems for Learning in Reproducing Kernel Banach Spaces

    Updated: 2024-05-18 19:32:04
    Rui Wang, Yuesheng Xu, Mingsong Yan; JMLR 25(93):1–45, 2024. Abstract: Sparsity of a learning solution is a desirable feature in machine learning. Certain reproducing kernel Banach spaces (RKBSs) are appropriate hypothesis spaces for sparse learning methods. The goal of this paper is to understand what kind of RKBSs can promote sparsity for learning solutions. We consider two typical learning models in an RKBS: the minimum norm interpolation (MNI) problem and the regularization problem. We first establish an explicit representer

  • ptwt - The PyTorch Wavelet Toolbox

    Updated: 2024-05-18 19:32:04
    The fast wavelet transform is an essential workhorse in signal processing. Wavelets are local in the spatial or temporal domain and in the frequency domain. This property enables frequency-domain analysis while preserving some spatiotemporal information. Until recently, wavelets rarely appeared in the machine learning literature. We provide the PyTorch Wavelet Toolbox to make wavelet methods more accessible to the deep learning community. Our PyTorch Wavelet Toolbox is well documented. A pip package is installable with `pip install ptwt`.

  • Representation Learning via Manifold Flattening and Reconstruction

    Updated: 2024-05-18 19:32:04
    Michael Psenka, Druv Pai, Vishal Raman, Shankar Sastry, Yi Ma; JMLR 25(132):1–47, 2024. Abstract: A common assumption for real-world, learnable data is its possession of some low-dimensional structure, and one way to formalize this structure is through the manifold hypothesis: that learnable data lies near some low-dimensional manifold. Deep learning architectures often have a compressive autoencoder component, where data is mapped to a lower-dimensional latent space, but often many architecture design choices are done by hand,

  • Adam-family Methods for Nonsmooth Optimization with Convergence Guarantees

    Updated: 2024-05-18 19:32:04
    Nachuan Xiao, Xiaoyin Hu, Xin Liu, Kim-Chuan Toh; JMLR 25(48):1–53, 2024. Abstract: In this paper, we present a comprehensive study on the convergence properties of Adam-family methods for nonsmooth optimization, especially in the training of nonsmooth neural networks. We introduce a novel framework that adopts a two-timescale updating scheme, and prove its convergence properties under mild assumptions. Our proposed framework encompasses various popular Adam-family methods, providing convergence guarantees for

  • Unsupervised Anomaly Detection Algorithms on Real-world Data: How Many Do We Need?

    Updated: 2024-05-18 19:32:04
    Roel Bouman, Zaharah Bukhsh, Tom Heskes; JMLR 25(105):1–34, 2024. Abstract: In this study we evaluate 33 unsupervised anomaly detection algorithms on 52 real-world multivariate tabular data sets, performing the largest comparison of unsupervised anomaly detection algorithms to date. On this collection of data sets, the EIF (Extended Isolation Forest) algorithm significantly outperforms most other algorithms. Visualizing and then clustering the relative performance of the considered algorithms on all data sets, we

  • OpenBox: A Python Toolkit for Generalized Black-box Optimization

    Updated: 2024-05-18 19:32:04
    Black-box optimization (BBO) has a broad range of applications, including automatic machine learning, experimental design, and database knob tuning. However, users still face challenges when applying BBO methods to their problems at hand with existing software packages in terms of applicability, performance, and efficiency. This paper presents OpenBox, an open-source BBO toolkit with improved usability. It implements user-friendly interfaces and visualization for users to define and manage their tasks. The modular design behind OpenBox facilitates its flexible deployment in existing systems. Experimental results demonstrate the effectiveness and efficiency of OpenBox over existing systems. The source code of OpenBox is available at https://github.com/PKU-DAIR/open-box.

  • Bagging Provides Assumption-free Stability

    Updated: 2024-05-18 19:32:04
    Bagging is an important technique for stabilizing machine learning models. In this paper, we derive a finite-sample guarantee on the stability of bagging for any model. Our result places no assumptions on the distribution of the data, on the properties of the base algorithm, or on the dimensionality of the covariates. Our guarantee applies to many variants of bagging and is optimal up to a constant. Empirical results validate our findings, showing that bagging successfully stabilizes even highly unstable base algorithms.
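A small numerical illustration of the phenomenon, with illustrative names and sizes: 1-nearest-neighbor regression is highly unstable under leave-one-out perturbations, and averaging it over bootstrap resamples (bagging) typically shrinks how far a prediction can move when one training point is dropped.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_nn_predict(x_train, y_train, x0):
    """Unstable base algorithm: 1-nearest-neighbor regression on R."""
    return y_train[np.argmin(np.abs(x_train - x0))]

def bagged_predict(x_train, y_train, x0, n_bags=200):
    """Average the base prediction over bootstrap resamples (bagging)."""
    n = len(x_train)
    preds = []
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)  # bootstrap sample with replacement
        preds.append(one_nn_predict(x_train[idx], y_train[idx], x0))
    return np.mean(preds)

# Worst-case prediction shift at x0 = 0 under leave-one-out perturbations.
n = 50
x, y = rng.normal(size=n), rng.normal(size=n)
base_full = one_nn_predict(x, y, 0.0)
bag_full = bagged_predict(x, y, 0.0)
base_shift = max(abs(one_nn_predict(np.delete(x, i), np.delete(y, i), 0.0) - base_full) for i in range(n))
bag_shift = max(abs(bagged_predict(np.delete(x, i), np.delete(y, i), 0.0) - bag_full) for i in range(n))
print(base_shift, bag_shift)  # bagging typically reduces the worst-case shift
```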

  • Data Thinning for Convolution-Closed Distributions

    Updated: 2024-05-18 19:32:04
    Anna Neufeld, Ameer Dharamshi, Lucy L. Gao, Daniela Witten; JMLR 25(57):1–35, 2024. Abstract: We propose data thinning, an approach for splitting an observation into two or more independent parts that sum to the original observation, and that follow the same distribution as the original observation, up to a known scaling of a parameter. This very general proposal is applicable to any convolution-closed distribution, a class that includes the Gaussian, Poisson, negative binomial, gamma, and binomial distributions, among others. Data thinning
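The Poisson case makes the construction concrete: if X ~ Poisson(λ) and X1 | X ~ Binomial(X, ε), then X1 and X2 = X − X1 are independent Poisson(ελ) and Poisson((1 − ε)λ) draws that sum back to X. A quick numerical check (sample sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, eps, n = 10.0, 0.3, 200_000

x = rng.poisson(lam, size=n)   # original observations
x1 = rng.binomial(x, eps)      # thinned "training" half
x2 = x - x1                    # thinned "test" half; x1 + x2 == x exactly

print(x1.mean(), x2.mean())       # close to eps*lam = 3 and (1-eps)*lam = 7
print(np.corrcoef(x1, x2)[0, 1])  # close to 0: the two parts are independent
```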

  • Efficient Modality Selection in Multimodal Learning

    Updated: 2024-05-18 19:32:04
    Yifei He, Runxiang Cheng, Gargi Balasubramaniam, Yao-Hung Hubert Tsai, Han Zhao; JMLR 25(47):1–39, 2024. Abstract: Multimodal learning aims to learn from data of different modalities by fusing information from heterogeneous sources. Although it is beneficial to learn from more modalities, it is often infeasible to use all available modalities under limited computational resources. Modeling with all available modalities can also be inefficient and unnecessary when information across input modalities overlaps. In this paper, we study the modality

  • Exploration of the Search Space of Gaussian Graphical Models for Paired Data

    Updated: 2024-05-18 19:32:04
    Alberto Roverato, Dung Ngoc Nguyen; JMLR 25(92):1–41, 2024. Abstract: We consider the problem of learning a Gaussian graphical model in the case where the observations come from two dependent groups sharing the same variables. We focus on a family of coloured Gaussian graphical models specifically suited for the paired data problem. Commonly, graphical models are ordered by the submodel relationship so that the search space is a lattice, called the model inclusion lattice. We introduce a novel order between models, named the

  • A Multilabel Classification Framework for Approximate Nearest Neighbor Search

    Updated: 2024-05-18 19:32:04
    Ville Hyvönen, Elias Jääsaari, Teemu Roos; JMLR 25(46):1–51, 2024. Abstract: To learn partition-based index structures for approximate nearest neighbor (ANN) search, both supervised and unsupervised machine learning algorithms have been used. Existing supervised algorithms select all the points that belong to the same partition element as the query point as nearest neighbor candidates. Consequently, they formulate the learning task as finding a partition in which the nearest neighbors of a query point belong to the same

  • Predictive Inference with Weak Supervision

    Updated: 2024-05-18 19:32:04
    Maxime Cauchois, Suyash Gupta, Alnur Ali, John C. Duchi; JMLR 25(118):1–45, 2024. Abstract: The expense of acquiring labels in large-scale statistical machine learning makes partially and weakly labeled data attractive, though it is not always apparent how to leverage such data for model fitting or validation. We present a methodology to bridge the gap between partial supervision and validation, developing a conformal prediction framework to provide valid predictive confidence sets, i.e., sets that cover a true label with a prescribed probability, independent of

  • Multi-class Probabilistic Bounds for Majority Vote Classifiers with Partially Labeled Data

    Updated: 2024-05-18 19:32:04
    Vasilii Feofanov, Emilie Devijver, Massih-Reza Amini; JMLR 25(104):1–47, 2024. Abstract: In this paper, we propose a probabilistic framework for analyzing a multi-class majority vote classifier in the case where training data is partially labeled. First, we derive a multi-class transductive bound over the risk of the majority vote classifier, which is based on the classifier's vote distribution over each class. Then, we introduce a mislabeling error model to analyze the error of the majority vote classifier in

  • Multiple Descent in the Multiple Random Feature Model

    Updated: 2024-05-18 19:32:04
    Xuran Meng, Jianfeng Yao, Yuan Cao; JMLR 25(44):1–49, 2024. Abstract: Recent works have demonstrated a double descent phenomenon in over-parameterized learning. Although this phenomenon has been investigated by recent works, it has not been fully understood in theory. In this paper, we investigate the multiple descent phenomenon in a class of multi-component prediction models. We first consider a double random feature model (DRFM) concatenating two types of random features, and study the excess risk achieved by the DRFM in ridge regression. We

  • Differentially Private Data Release for Mixed-type Data via Latent Factor Models

    Updated: 2024-05-18 19:32:04
    Yanqing Zhang, Qi Xu, Niansheng Tang, Annie Qu; JMLR 25(116):1–37, 2024. Abstract: Differential privacy is a data privacy-preserving technology which enables synthetic data or statistical analysis results to be released with a minimum disclosure of private information from individual records. The tradeoff between privacy preservation and utility guarantees is always a challenge for differential privacy technology, especially for synthetic data generation. In this paper, we propose a differentially private data

  • The good, the bad and the ugly sides of data augmentation: An implicit spectral regularization perspective

    Updated: 2024-05-18 19:32:04
    Chi-Heng Lin, Chiraag Kaushik, Eva L. Dyer, Vidya Muthukumar; JMLR 25(91):1–85, 2024. Abstract: Data augmentation (DA) is a powerful workhorse for bolstering performance in modern machine learning. Specific augmentations like translations and scaling in computer vision are traditionally believed to improve generalization by generating new artificial data from the same distribution. However, this traditional viewpoint does not explain the success of prevalent augmentations in modern machine

  • Mathematical Framework for Online Social Media Auditing

    Updated: 2024-05-18 19:32:04
    Wasim Huleihel, Yehonathan Refael; JMLR 25(64):1–40, 2024. Abstract: Social media platforms (SMPs) leverage algorithmic filtering (AF) as a means of selecting the content that constitutes a user's feed with the aim of maximizing their rewards. Selectively choosing the contents to be shown on the user's feed may yield a certain extent of influence, either minor or major, on the user's decision-making, compared to what it would have been under a natural/fair content selection. As we have witnessed over the past decade, algorithmic filtering can cause

  • The Non-Overlapping Statistical Approximation to Overlapping Group Lasso

    Updated: 2024-05-18 19:32:04
    Mingyu Qi, Tianxi Li; JMLR 25(115):1–70, 2024. Abstract: The group lasso penalty is widely used to introduce structured sparsity in statistical learning, characterized by its ability to eliminate predefined groups of parameters automatically. However, when the groups overlap, solving the group lasso problem can be time-consuming in high-dimensional settings due to the groups' non-separability. This computational challenge has limited the applicability of the overlapping group lasso penalty in cutting-edge areas, such as gene pathway

  • Functional Directed Acyclic Graphs

    Updated: 2024-05-18 19:32:04
    Kuang-Yao Lee, Lexin Li, Bing Li; JMLR 25(78):1–48, 2024. Abstract: In this article, we introduce a new method to estimate a directed acyclic graph (DAG) from multivariate functional data. We build on the notion of faithfulness that relates a DAG with a set of conditional independences among the random functions. We develop two linear operators, the conditional covariance operator and the partial correlation operator, to characterize and evaluate conditional independence. Based on these operators, we adapt and extend the PC-algorithm to estimate the functional

  • Information Processing Equalities and the Information–Risk Bridge

    Updated: 2024-05-18 19:32:04
    Robert C. Williamson, Zac Cranko; JMLR 25(103):1–53, 2024. Abstract: We introduce two new classes of measures of information for statistical experiments which generalise and subsume φ-divergences, integral probability metrics, N-distances (MMD), and (f,Γ)-divergences between two or more distributions. This enables us to derive a simple geometrical relationship between measures of information and the Bayes risk of a statistical decision problem, thus extending the variational φ-divergence representation to multiple distributions in an entirely

  • Invariant and Equivariant Reynolds Networks

    Updated: 2024-05-18 19:32:04
    Akiyoshi Sannai, Makoto Kawano, Wataru Kumagai; JMLR 25(42):1–36, 2024. Abstract: Various data exhibit symmetry, including permutations in graphs and point clouds. Machine learning methods that utilize this symmetry have achieved considerable success. In this study, we explore learning models for data exhibiting group symmetry. Our focus is on transforming deep neural networks using Reynolds operators, which average over the group to convert a function into an invariant or equivariant form. While learning methods based on Reynolds operators are

  • Unlabeled Principal Component Analysis and Matrix Completion

    Updated: 2024-05-18 19:32:04
    Yunzhen Yao, Liangzu Peng, Manolis C. Tsakiris; JMLR 25(77):1–38, 2024. Abstract: We introduce robust principal component analysis from a data matrix in which the entries of its columns have been corrupted by permutations, termed Unlabeled Principal Component Analysis (UPCA). Using algebraic geometry, we establish that UPCA is a well-defined algebraic problem, since we prove that the only matrices of minimal rank that agree with the given data are row-permutations of the ground-truth matrix, arising as the unique solutions of a polynomial system

  • Personalized PCA: Decoupling Shared and Unique Features

    Updated: 2024-05-18 19:32:04
    Naichen Shi, Raed Al Kontar; JMLR 25(41):1–82, 2024. Abstract: In this paper, we tackle a significant challenge in PCA: heterogeneity. When data are collected from different sources with heterogeneous trends while still sharing some congruency, it is critical to extract shared knowledge while retaining the unique features of each source. To this end, we propose personalized PCA (PerPCA), which uses mutually orthogonal global and local principal components to encode both unique and shared features. We show that, under mild conditions, both

  • Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks

    Updated: 2024-05-18 19:32:04
    Dong-Young Lim, Sotirios Sabanis; JMLR 25(53):1–52, 2024. Abstract: We present a new class of Langevin-based algorithms, which overcomes many of the known shortcomings of popular adaptive optimizers that are currently used for the fine-tuning of deep learning models. Its underpinning theory relies on recent advances in Euler–Krylov polygonal approximations for stochastic differential equations (SDEs) with monotone coefficients. As a result, it inherits the stability properties of tamed

  • Nonparametric Regression for 3D Point Cloud Learning

    Updated: 2024-05-18 19:32:04
    Xinyi Li, Shan Yu, Yueying Wang, Guannan Wang, Li Wang, Ming-Jun Lai; JMLR 25(102):1–56, 2024. Abstract: In recent years, there has been an exponential increase in the amount of point clouds with irregular shapes collected in various areas. Motivated by the importance of solid modeling for point clouds, we develop a novel and efficient smoothing tool based on multivariate splines over triangulation to extract the underlying signal and build up a 3D solid model from the point cloud. The proposed method can denoise or deblur the point cloud

  • Survival Kernets: Scalable and Interpretable Deep Kernel Survival Analysis with an Accuracy Guarantee

    Updated: 2024-05-18 19:32:04
    George H. Chen; JMLR 25(40):1–78, 2024. Abstract: Kernel survival analysis models estimate individual survival distributions with the help of a kernel function, which measures the similarity between any two data points. Such a kernel function can be learned using deep kernel survival models. In this paper, we present a new deep kernel survival model called a survival kernet, which scales to large datasets in a manner that is amenable to model interpretation and also theoretical analysis.

  • AMLB: an AutoML Benchmark

    Updated: 2024-05-18 19:32:04
    Pieter Gijsbers, Marcos L. P. Bueno, Stefan Coors, Erin LeDell, Sébastien Poirier, Janek Thomas, Bernd Bischl, Joaquin Vanschoren; JMLR 25(101):1–65, 2024. Abstract: Comparing different AutoML frameworks is notoriously challenging and often done incorrectly. We introduce an open and extensible benchmark that follows best practices and avoids common mistakes when comparing AutoML frameworks. We conduct a thorough comparison of 9 well-known AutoML frameworks across 71 classification and 33 regression tasks. The differences between the AutoML frameworks are explored with

  • Spatial meshing for general Bayesian multivariate models

    Updated: 2024-05-18 19:32:04
    Michele Peruzzi, David B. Dunson; JMLR 25(87):1–49, 2024. Abstract: Quantifying spatial and/or temporal associations in multivariate geolocated data of different types is achievable via spatial random effects in a Bayesian hierarchical model, but severe computational bottlenecks arise when spatial dependence is encoded as a latent Gaussian process (GP) in the increasingly common large-scale data settings on which we focus. The scenario worsens in non-Gaussian models because the reduced analytical tractability leads to additional hurdles to

  • Differentially private methods for managing model uncertainty in linear regression

    Updated: 2024-05-18 19:32:04
    In this article, we propose differentially private methods for hypothesis testing, model averaging, and model selection for normal linear models. We propose Bayesian methods based on mixtures of $g$-priors and non-Bayesian methods based on likelihood-ratio statistics and information criteria. The procedures are asymptotically consistent and straightforward to implement with existing software. We focus on practical issues such as adjusting critical values so that hypothesis tests have adequate type I error rates and quantifying the uncertainty introduced by the privacy-ensuring mechanisms.
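A generic Laplace-mechanism sketch (not the paper's specific procedure) illustrates the privacy-induced noise whose effect on critical values the paper quantifies: a statistic with bounded sensitivity is released with calibrated additive noise.

```python
import numpy as np

def laplace_mechanism(stat_value, sensitivity, epsilon, rng):
    """Release stat_value with epsilon-differential privacy, assuming
    |stat(D) - stat(D')| <= sensitivity for all neighboring data sets."""
    scale = sensitivity / epsilon
    return stat_value + rng.laplace(0.0, scale)

rng = np.random.default_rng(0)
data = np.clip(rng.normal(size=1000), -1.0, 1.0)  # clipping bounds sensitivity
mean_stat = data.mean()                           # sensitivity <= 2/n after clipping
private_mean = laplace_mechanism(mean_stat, sensitivity=2.0 / 1000,
                                 epsilon=1.0, rng=rng)
print(mean_stat, private_mean)  # the released value carries calibrated noise
```

A hypothesis test built on `private_mean` must widen its critical values to account for the Laplace noise, which is exactly the kind of adjustment the article studies.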

  • Semi-supervised Inference for Block-wise Missing Data without Imputation

    Updated: 2024-05-18 19:32:04
    We consider statistical inference for single or low-dimensional parameters in a high-dimensional linear model under a semi-supervised setting, wherein the data are a combination of a labelled block-wise missing data set of a relatively small size and a large unlabelled data set. The proposed method utilises both labelled and unlabelled data without any imputation or removal of the missing observations. The asymptotic properties of the estimator are established under regularity conditions. Hypothesis testing for low-dimensional coefficients is also studied. Extensive simulations are conducted to examine the theoretical results. The method is evaluated on the Alzheimer's Disease Neuroimaging Initiative data.

  • Spectral learning of multivariate extremes

    Updated: 2024-05-18 19:32:04
    Spectral learning of multivariate extremes. Marco Avella Medina, Richard A. Davis, Gennady Samorodnitsky. 25(124):1–36, 2024. Abstract: We propose a spectral clustering algorithm for analyzing the dependence structure of multivariate extremes. More specifically, we focus on the asymptotic dependence of multivariate extremes characterized by the angular or spectral measure in extreme value theory. Our work studies the theoretical performance of spectral clustering based on a random $k$-nearest neighbor graph constructed from an extremal sample, i.e., the angular part of random vectors for which the …
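For context, the angular (spectral) measure mentioned above is the standard object from extreme value theory (notation mine): for a regularly varying random vector $X$ with tail index $\alpha$,

```latex
\lim_{t\to\infty} \Pr\!\left( \frac{\lVert X\rVert}{t} > r,\; \frac{X}{\lVert X\rVert} \in A \;\middle|\; \lVert X\rVert > t \right) \;=\; r^{-\alpha}\, S(A),
```

so the "extremal sample" consists of the angles $X_i/\lVert X_i\rVert$ of the observations with the largest norms, and the clustering is performed on these angles.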

  • Optimal First-Order Algorithms as a Function of Inequalities

    Updated: 2024-05-18 19:32:04
    In this work, we present a novel algorithm design methodology that finds the optimal algorithm as a function of inequalities. Specifically, we restrict convergence analyses of algorithms to use a prespecified subset of inequalities, rather than utilizing all true inequalities, and find the optimal algorithm subject to this restriction. This methodology allows us to design algorithms with certain desired characteristics. As concrete demonstrations of this methodology, we find new state-of-the-art accelerated first-order gradient methods using randomized coordinate updates and backtracking line searches.
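As an illustration of the kind of prespecified inequality such a restricted analysis might use (a standard smoothness inequality from the performance-estimation literature, not necessarily the subset chosen in the paper): for a convex function $f$ with $L$-Lipschitz gradient,

```latex
f(y) \;\ge\; f(x) + \langle \nabla f(x),\, y - x\rangle + \frac{1}{2L}\,\lVert \nabla f(y) - \nabla f(x)\rVert^2 \qquad \text{for all } x,\, y .
```

Restricting a convergence analysis to consequences of one such family of inequalities, and then optimizing the algorithm subject to that restriction, is the spirit of the methodology.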

  • A Semi-parametric Estimation of Personalized Dose-response Function Using Instrumental Variables

    Updated: 2024-05-18 19:32:04
    A Semi-parametric Estimation of Personalized Dose-response Function Using Instrumental Variables. Wei Luo, Yeying Zhu, Xuekui Zhang, Lin Lin. 25(86):1–38, 2024. Abstract: In the application of instrumental variable analysis that conducts causal inference in the presence of unmeasured confounding, invalid instrumental variables and weak instrumental variables often exist, which complicates the analysis. In this paper, we propose a model-free dimension reduction procedure to select the invalid instrumental variables and refine them into lower-dimensional linear combinations. The procedure also …

  • Data Summarization via Bilevel Optimization

    Updated: 2024-05-18 19:32:04
    Data Summarization via Bilevel Optimization. Zalán Borsos, Mojmír Mutný, Marco Tagliasacchi, Andreas Krause. 25(73):1–53, 2024. Abstract: The increasing availability of massive data sets poses various challenges for machine learning. Prominent among these is learning models under hardware or human resource constraints. In such resource-constrained settings, a simple yet powerful approach is operating on small subsets of the data. Coresets are weighted subsets of the data that provide approximation guarantees for the optimization objective. However, existing coreset constructions are highly …
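As a baseline illustration of the coreset idea (a naive uniform subsample with inverse-inclusion weights, not the bilevel construction of the paper; the sum-of-squares loss is a stand-in objective):

```python
import random

def full_loss(theta, xs):
    # Full-data sum-of-squares objective.
    return sum((x - theta) ** 2 for x in xs)

def uniform_coreset(xs, m, rng):
    # Uniform subsample of size m; the weight n/m makes the weighted
    # loss an unbiased estimator of the full loss for every theta.
    sub = rng.sample(xs, m)
    return sub, len(xs) / m

def coreset_loss(theta, sub, weight):
    # Weighted objective evaluated on the summary only.
    return weight * sum((x - theta) ** 2 for x in sub)
```

Principled constructions choose the subset and weights so that the approximation holds uniformly over the parameter, with far fewer points than uniform sampling needs.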

  • Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits

    Updated: 2024-05-18 19:32:04
    Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits. Junpei Komiyama, Edouard Fouché, Junya Honda. 25(112):1–56, 2024. Abstract: We consider nonstationary multi-armed bandit problems where the model parameters of the arms change over time. We introduce the adaptive resetting bandit (ADR-bandit), a bandit algorithm class that leverages adaptive windowing techniques from the literature on data streams. We first provide new guarantees on the quality of estimators resulting from adaptive windowing techniques, which are of independent interest. Furthermore, we conduct a finite-time analysis of …
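A minimal sketch of the adaptive-windowing idea the abstract builds on: keep a window of recent rewards and drop the stale half when the means of the two halves differ by more than a Hoeffding-style threshold. This is a simplified ADWIN-flavored detector, not the ADR-bandit algorithm itself, and the threshold and window sizes are illustrative:

```python
import math
from collections import deque

class ResettingMean:
    """Track a stream mean in [0, 1]; reset when a change is detected."""

    def __init__(self, delta=0.05, max_len=200):
        self.window = deque(maxlen=max_len)
        self.delta = delta

    def update(self, x):
        self.window.append(x)
        n = len(self.window)
        if n >= 20:
            half = n // 2
            w = list(self.window)
            m1 = sum(w[:half]) / half
            m2 = sum(w[half:]) / (n - half)
            # Hoeffding-style threshold for bounded rewards.
            eps = math.sqrt(2.0 / half * math.log(2.0 / self.delta))
            if abs(m1 - m2) > eps:
                # Change detected: discard the stale first half.
                self.window = deque(w[half:], maxlen=self.window.maxlen)

    def mean(self):
        return sum(self.window) / len(self.window)
```

After an abrupt shift in the reward stream, the tracker discards pre-change observations, so its estimate converges to the new mean instead of averaging across the change point.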

  • Sum-of-norms clustering does not separate nearby balls

    Updated: 2024-05-18 19:32:04
    Sum-of-norms clustering does not separate nearby balls. Alexander Dunlap, Jean-Christophe Mourrat. 25(123):1–40, 2024. Abstract: Sum-of-norms clustering is a popular convexification of $K$-means clustering. We show that, if the dataset is made of a large number of independent random variables distributed according to the uniform measure on the union of two disjoint balls of unit radius, and if the balls are sufficiently close to one another, then sum-of-norms clustering will typically fail to recover the decomposition of the dataset into two clusters. As the dimension tends to infinity, this …
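For reference, the sum-of-norms objective is the standard convex program from the clustering literature: given data $x_1,\dots,x_n$,

```latex
\min_{u_1,\dots,u_n} \; \frac{1}{2}\sum_{i=1}^{n} \lVert x_i - u_i \rVert^2 \;+\; \lambda \sum_{i<j} \lVert u_i - u_j \rVert ,
```

where points $i$ and $j$ are assigned to the same cluster exactly when the optimal centroids satisfy $u_i = u_j$; the abstract's negative result says this fusion typically fails to split two sufficiently close balls.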

  • Sparse NMF with Archetypal Regularization: Computational and Robustness Properties

    Updated: 2024-05-18 19:32:04
    Sparse NMF with Archetypal Regularization: Computational and Robustness Properties. Kayhan Behdin, Rahul Mazumder. 25(36):1–62, 2024. Abstract: We consider the problem of sparse nonnegative matrix factorization (NMF) using archetypal regularization. The goal is to represent a collection of data points as nonnegative linear combinations of a few nonnegative sparse factors with appealing geometric properties, arising from the use of archetypal regularization. We generalize the notion of robustness studied in Javadi and Montanari (2019) (without sparsity) to the notion of a strong robustness that implies …
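One way to write NMF with an archetypal penalty, sketched in the spirit of Javadi and Montanari (2019) — the paper's exact formulation may differ: with data rows stacked in $X \in \mathbb{R}^{n \times d}$ and factors $h_k$ as the rows of $H$,

```latex
\min_{W \ge 0,\; H \ge 0} \; \lVert X - WH \rVert_F^2 \;+\; \lambda \sum_{k=1}^{r} \min_{\alpha \in \Delta^{n-1}} \lVert h_k - X^\top \alpha \rVert^2 ,
```

so the penalty pulls each factor (archetype) toward the convex hull of the data, which is the geometric property the abstract alludes to.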

  • Scaling the Convex Barrier with Sparse Dual Algorithms

    Updated: 2024-05-18 19:32:04
    Scaling the Convex Barrier with Sparse Dual Algorithms. Alessandro De Palma, Harkirat Singh Behl, Rudy Bunel, Philip H. S. Torr, M. Pawan Kumar. 25(61):1–51, 2024. Abstract: Tight and efficient neural network bounding is crucial to the scaling of neural network verification systems. Many efficient bounding algorithms have been presented recently, but they are often too loose to verify more challenging properties. This is due to the weakness of the employed relaxation, which is usually a linear program of size linear in the number of neurons. While a tighter linear relaxation for …
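The "convex barrier" in the title refers to the standard single-neuron linear relaxation of the ReLU (the triangle relaxation): for a pre-activation $z$ with known bounds $l < 0 < u$, the output $\hat z = \max(0, z)$ is relaxed to

```latex
\hat z \;\ge\; 0, \qquad \hat z \;\ge\; z, \qquad \hat z \;\le\; \frac{u\,(z - l)}{u - l} .
```

Tighter relaxations couple many neurons at once, which makes the resulting linear programs much larger and motivates the sparse dual algorithms the paper proposes.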

  • Learning Non-Gaussian Graphical Models via Hessian Scores and Triangular Transport

    Updated: 2024-05-18 19:32:04
    Learning Non-Gaussian Graphical Models via Hessian Scores and Triangular Transport. Ricardo Baptista, Rebecca Morrison, Olivier Zahm, Youssef Marzouk. 25(85):1–46, 2024. Abstract: Undirected probabilistic graphical models represent the conditional dependencies, or Markov properties, of a collection of random variables. Knowing the sparsity of such a graphical model is valuable for modeling multivariate distributions and for efficiently performing inference. While the problem of learning graph structure from data has been studied extensively for certain parametric families of distributions, most …
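The Hessian connection behind the title can be stated as follows (a standard fact for smooth, strictly positive densities; the paper's precise score functional may differ): $X_i$ and $X_j$ are conditionally independent given the remaining variables if and only if

```latex
\frac{\partial^2}{\partial x_i \, \partial x_j} \log \pi(x) \;=\; 0 \quad \text{for all } x ,
```

so an averaged quantity such as $\Omega_{ij} = \mathbb{E}_\pi\!\left[\left(\partial_i \partial_j \log \pi(X)\right)^2\right]$ vanishes exactly on the missing edges of the graph, making it a natural score for structure learning.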

  • Pareto Smoothed Importance Sampling

    Updated: 2024-05-18 19:32:04
    Pareto Smoothed Importance Sampling. Aki Vehtari, Daniel Simpson, Andrew Gelman, Yuling Yao, Jonah Gabry. 25(72):1–58, 2024. Abstract: Importance weighting is a general way to adjust Monte Carlo integration to account for draws from the wrong distribution, but the resulting estimate can be highly variable when the importance ratios have a heavy right tail. This routinely occurs when there are aspects of the target distribution that are not well captured by the approximating distribution, in which case more stable estimates can be obtained by modifying extreme importance ratios. We present a new …
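To make the "modifying extreme importance ratios" idea concrete, here is a hedged sketch using plain weight truncation at $\bar w \sqrt{n}$, the simpler rule of Ionides (2008), rather than the generalized-Pareto smoothing the paper develops. The log densities may be unnormalized, since the estimator is self-normalized:

```python
import math

def truncated_is_estimate(xs, log_p, log_q, f):
    """Self-normalized importance sampling with truncated weights."""
    n = len(xs)
    logw = [log_p(x) - log_q(x) for x in xs]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]   # shift for numerical stability
    wbar = sum(w) / n
    cap = wbar * math.sqrt(n)               # truncation level: tame the tail
    w = [min(wi, cap) for wi in w]
    return sum(wi * f(x) for wi, x in zip(w, xs)) / sum(w)
```

Truncation bounds the variance at the cost of a small bias; Pareto smoothing instead fits a generalized Pareto distribution to the upper tail of the ratios and replaces the extreme weights with order statistics of the fit, giving a better bias-variance trade-off.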

  • Resource-Efficient Neural Networks for Embedded Systems

    Updated: 2024-05-18 19:32:04
    Resource-Efficient Neural Networks for Embedded Systems. Wolfgang Roth, Günther Schindler, Bernhard Klein, Robert Peharz, Sebastian Tschiatschek, Holger Fröning, Franz Pernkopf, Zoubin Ghahramani. 25(50):1–51, 2024. Abstract: While machine learning is traditionally a resource-intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is …

  • Visualize This (2nd ed.): Finding the Best Visualization Tools

    Updated: 2024-05-18 02:37:33
    There are a lot of tools to visualize data. Some are visualization-specific. Some let you make charts but are focused on other data tasks. New apps come out with features that promise new things. All of this can make it tricky to find the best visualization tool.

  • Stunning New Data Visualizations Not to Miss — DataViz Weekly

    Updated: 2024-05-17 16:25:04
    Data visualization is not just about making data look pretty; it’s about uncovering hidden patterns, revealing trends, and providing a clearer understanding of information. Through innovative visual techniques, we can explore data in ways that traditional methods can’t match. This week on DataViz Weekly, we bring you four compelling projects that showcase the transformative power […] The post Stunning New Data Visualizations Not to Miss — DataViz Weekly appeared first on AnyChart News.

  • ✚ Does the data make sense?

    Updated: 2024-05-16 20:26:22
    Does the data make sense? May 16, 2024. Topic: The Process (analysis, error, questions). When you analyze data, there are times when a trend, pattern, or outlier jumps out and smacks you in the face. Or you might calculate results that seem surprising. Maybe they're real, but maybe not. Make sure you know which before you go shouting your insights from the rooftop. This members-only issue is part of The Process, a weekly newsletter on how visualization tools, rules, and guidelines work in practice, published every Thursday.

  • Map of magnetic fields in the Milky Way

    Updated: 2024-05-15 08:48:53
    Map of magnetic fields in the Milky Way. May 15, 2024. Based on data from NASA's Stratospheric Observatory for Infrared Astronomy (SOFIA), Villanova University researchers developed a map of the magnetic fields in the Milky Way. For Strange Maps, Frank Jacobs writes: The colors show the interaction between warmer dust clouds (pink), cooler ones (blue), and magnetic fields, indicated by radio filaments (yellow), mysterious tendrils up to 150 light-years long. By revealing variations in the orientation of magnetic fields across dust clouds, some with fanciful names like The Brick and Three Little Pigs, this map offers a first glimpse at the complex arrangements of dust and magnetism in the …

  • Communal Plot, a shared coordinate space to see how your taste compares

    Updated: 2024-05-14 09:14:29
    Communal Plot, a shared coordinate space to see how your taste compares. May 14, 2024. Topic: Statistical Visualization (PerThirtySix, sharing, social). PerThirtySix made a communal plot that asks for your opinion via scatterplot, and you can see how you compare against the aggregates. A new poll goes up every day. The inspiration for this comes from a whiteboard in an office I used to work at. Every so often, a new pair of questions would be posted and people would contribute their answers by marking where on the scatterplot they belonged. It was fun seeing how my answers compared to others, and guessing who might have answered where. I hope this tool brings you some of that fun.

  • Let’s Connect at Qlik Connect 2024: AnyChart Booth #807

    Updated: 2024-05-13 20:08:25
    Qlik Connect 2024 is on the horizon, and we’re pleased to announce that AnyChart will be participating as an Emerald sponsor and exhibitor. Join us from June 3–5 at Rosen Shingle Creek in Orlando, Booth #807! Read more at qlik.anychart.com »

  • ✚ Staying in the Generative Loop

    Updated: 2024-05-09 18:30:41
    Staying in the Generative Loop. May 9, 2024. Topic: The Process (AI, generative, mashup). Maybe one day AI tools will be advanced enough to process a random dataset and produce valuable insights that incorporate the context of the real world. But that day is not today. Today, we can play with the tools available to us and see what happens. I'm Nathan Yau. I am a real person, for now. This is The Process, the members-only newsletter for FlowingData that looks closer at how we visualize data.

  • Exploring Insights with Data Visualization — DataViz Weekly

    Updated: 2024-05-03 17:18:22
    Welcome to the new DataViz Weekly, where we continue exploring the transformative power of data visualization. This edition presents a selection of new examples of how charts and maps can help us understand trends and patterns in various subjects — from sports and philanthropy to global challenges like press freedom and inflation, and everyday topics […]

  • The Cool Grey City of Data: inside the San Francisco Chronicle’s data team

    Updated: 2024-04-29 21:24:11
    NEW PODCAST EPISODE: Dan Kopf and Nami Sumida join Simon and Alberto to discuss how the SF Chronicle tells data stories, such as Sumida’s recent exploration of the city’s Japantown (sub required) and the WW2 internment that nearly destroyed it. The team discuss what makes the Bay Area such a rich source of data journalism …

Previous Months Items

Apr 2024 | Mar 2024 | Feb 2024 | Jan 2024 | Dec 2023 | Nov 2023